Prefix table/ LPS in KMP algorithm and its applications 您所在的位置:网站首页 algorithm pattern generator Prefix table/ LPS in KMP algorithm and its applications

Prefix table/ LPS in KMP algorithm and its applications

2023-09-05 03:39| 来源: 网络整理| 查看: 265

In this article, we have presented the optimal linear time algorithm to compute the Prefix table/ LPS (Longest Prefix Suffix) which is used in KMP algorithm along with other key applications.

Table of contents:

Introduction to Prefix table/ LPS Optimal Algorithm for Prefix Table Example for Creating KMP Algorithm's LPS (Prefix Table) Implementations of Prefix table/ LPS Time & Space Complexity Applications of Prefix Table/ LPS

Let us get started with Prefix table/ LPS in KMP algorithm and its applications.

Introduction to Prefix table/ LPS

Prefix table (also known as LPS/ Longest Prefix Suffix) is an array data structure which captures the longest prefix which is also a suffix for every substring starting at index 0.

This data structure was first used in KMP algorithm which is used in find a pattern in a given set of strings. The algorithm compares character by character and uses the Prefix table/ LPS to make a decision to skip few characters when a mismatch occurs. Prefix table is computed as a part of pre-processing and the key is to compute it in linear time O(N) where N is the length of the pattern for which LPS is calculated.

Definition of LPS:

LPS = "Longest Proper Prefix which is also Suffix"

LPS[i] = MAXIMUM(j) such that string[0 to j-1] == string[i-j+1 to i]

that is: longest sub-string from index 0 which also, ends at index i.

This simple pattern stored in LPS can be used to accelerate several algorithm and also, enable a linear time string pattern matching algorithm known as KMP algorithm.

The brute force approach to compute values of LPS take O(N2) time. This is an overhead for most algorithms.

An optimal algorithm exists which can compute LPS in linear time O(N) and this makes the use of Prefix table LPS widespread.

Optimal Algorithm for Prefix Table

Define one dimentional array with the size equal to the length of the pattern (LPS[pattern.length])

Define variables i and j and set LPS[0]=0.

compare the characters in the pattern at the pattern[i] and pattern[j].

If both the characters matches then set LPS[j]=i+1 and increment both i and j values by 1. GOTO STEP 3

If both the characters do not match

check the value of i if its '0' then set LPS[j]=0 and increment 'j' value by one if its not '0' then set i=LPS[i-1]. GOTO STEP 3

Repeat Above steps untill all the characters of the pattern are traversed.

Example for Creating KMP Algorithm's LPS (Prefix Table)

Let the given pattern be

Index: 0 1 2 3 4 5 6 String: A B C D A B D

Let us define LPS[] array with size =length of pattern which is 7

Index: 0 1 2 3 4 5 6 LPS: step1=Define variables i & j. Set i=0,j=1 and LPS[0]=0. i=0 and j=1 Index: 0 1 2 3 4 5 6 LPS: 0 step2=Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one. i=0 and j=2 Index: 0 1 2 3 4 5 6 LPS: 0 0 step3= Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one. i=0 and j=3 Index: 0 1 2 3 4 5 6 LPS: 0 0 0 step4=Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one. i=0 and j=4 Index: 0 1 2 3 4 5 6 LPS: 0 0 0 0 step5=Compare Pattern[i] and PAttern[j] .Since both are matching set LPS[j]=i+1 and increment 'j' and 'i' value by one. i=1 and j=5 Index: 0 1 2 3 4 5 6 LPS: 0 0 0 0 1 step6=Compare Pattern[i] and PAttern[j] .Since both are matching set LPS[j]=i+1 and increment 'j' and 'i' value by one. i=2 and j=6 Index: 0 1 2 3 4 5 6 LPS: 0 0 0 0 1 2 step7=Compare Pattern[i] and PAttern[j] .Since both are not matching and i!=0 ,we need to set i= LPS[i-1]===> LPS[2-1]=LPS[1]=0. i=0 and j=6 Index: 0 1 2 3 4 5 6 LPS: 0 0 0 0 1 2 step8=Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one. Index: 0 1 2 3 4 5 6 LPS: 0 0 0 0 1 2 0

Here LPS[] is filled with all values so we stop the process .The final LPS[] table is as follows..

Index: 0 1 2 3 4 5 6 LPS: 0 0 0 0 1 2 0 Implementations of Prefix table/ LPS

Following is the implementation of our linear time algorithm for creating LPS array in C++:

vector prefixFunction(string s) { vector p(s.size()); int j = 0; for (int i = 1; i < (int)s.size(); i++) { while (j > 0 && s[j] != s[i]) j = p[j-1]; if (s[j] == s[i]) j++; p[i] = j; } return p; }

Following is the implementation of our linear time algorithm for creating LPS array in Python:

p='ababaca' l1 = len(p) j = 0 i = 1 prefix = [0] while len(prefix) < l1: if p[j] == p[i]: prefix.append(j+1) i += 1 j += 1 else: if j == 0: prefix.append(0) i += 1 if j != 0: j = prefix[j-1] print prefix

Following is the implementation of our linear time algorithm for creating LPS array in Java:

String pattern = "ababaca"; int i = 1, j = 0; int[] prefixArray = new int[pattern.length]; while (i < pattern.length) { while (pattern.charAt(i) != pattern.charAt(j) && j > 0) { j = prefixArray[j - 1]; } if (pattern.charAt(i) == pattern.charAt(j)) { prefixArray[i] = j + 1; i++; j++; } else { prefixArray[i] = j; i++; } } for (int k = 0; k < prefixArray.length; ++k) { System.out.println(prefixArray[k]); } Time & Space Complexity Best case time complexity: Θ(n) Space complexity: Θ(n) where n is the length of the string Applications of Prefix Table/ LPS

Applications of Prefix Table/ LPS are:

Search for a substring in a string. The Knuth-Morris-Pratt algorithm Counting the number of occurrences of each prefix The number of different substring in a string Compressing a string Building an automaton according to the prefix function Question What is the worst case time complexity of KMP algorithm for pattern searching (m = length of text, n = length of pattern)? O(m) O(n*m) O(n) O(logn) O(m) is the worst case as well.

With this article at OpenGenus, you must have a strong idea of how to calculate LPS array which is used in KMP algorithm efficiently.



【本文地址】

公司简介

联系我们

今日新闻

    推荐新闻

    专题文章
      CopyRight 2018-2019 实验室设备网 版权所有